Supplementary Material for Multilevel Clustering via Wasserstein Means

Authors

  • Nhat Ho
  • XuanLong Nguyen
  • Mikhail Yurochkin
  • Hai Bui
  • Dinh Phung

Abstract

… $]_{i,j} \in \mathbb{R}_+^{k \times k'}$ is the cost matrix, i.e. the matrix of pairwise distances between the elements of $G$ and $G'$, and $\langle A, B \rangle = \operatorname{tr}(A^\top B)$ is the Frobenius dot product of matrices. The optimal $T \in \Pi(G, G')$ in optimization problem (1) is called the optimal coupling of $G$ and $G'$; it represents the optimal transport between these two measures. When $k = k'$, the best known algorithms for finding the optimal transport have complexity $O(k^3 \log k)$. Recently, Cuturi (2013) proposed a regularized version of (1) based on the Sinkhorn distance, for which an approximation of the optimal transport can be computed in $O(k^2)$. Due to its favorably fast …
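The two computations in this paragraph (the exact optimal coupling and Cuturi's entropic approximation) can be illustrated with a small sketch. It reads problem (1) as minimizing $\langle T, M \rangle$ over couplings $T \in \Pi(G, G')$, as the excerpt describes. The sketch also rests on these assumptions:

- NumPy and SciPy are installed.
- The cost is the squared Euclidean distance, since the excerpt's definition of the cost matrix is truncated.
- The helper names `cost_matrix`, `exact_ot` and `sinkhorn` are hypothetical.

`exact_ot` solves the problem as a generic linear program with SciPy's HiGHS backend, not with a specialized $O(k^3 \log k)$ network-flow solver. `sinkhorn` performs the alternating scaling of $K = \exp(-M/\varepsilon)$. Each iteration consists of two matrix-vector products, which is where the $O(k^2)$ per-iteration cost comes from.

```python
import numpy as np
from scipy.optimize import linprog


def cost_matrix(support_g, support_h):
    """Assumed cost: squared Euclidean distances M[i, j] = ||theta_i - theta'_j||^2."""
    diff = support_g[:, None, :] - support_h[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def exact_ot(p, q, M):
    """Hypothetical helper: solve min_T <T, M> over couplings T in Pi(p, q) as a linear program."""
    k, k2 = M.shape
    # Row constraints: sum_j T[i, j] = p[i]  (T is flattened row-major)
    A_rows = np.kron(np.eye(k), np.ones((1, k2)))
    # Column constraints: sum_i T[i, j] = q[j]
    A_cols = np.kron(np.ones((1, k)), np.eye(k2))
    res = linprog(
        M.ravel(),
        A_eq=np.vstack([A_rows, A_cols]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    T = res.x.reshape(k, k2)
    # Frobenius dot product <T, M> = tr(T^T M) = sum of elementwise products
    return T, float(np.sum(T * M))


def sinkhorn(p, q, M, eps=1.0, n_iter=1000):
    """Hypothetical helper: entropy-regularized OT via Sinkhorn scaling of K = exp(-M / eps)."""
    K = np.exp(-M / eps)
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)  # rescale so column sums match q
        u = p / (K @ v)    # rescale so row sums match p (exact after this step)
    T = u[:, None] * K * v[None, :]
    return T, float(np.sum(T * M))


# Toy example: two uniform discrete measures with three atoms each in R^1.
support_g = np.array([[0.0], [1.0], [2.0]])
support_h = np.array([[0.5], [1.5], [2.5]])
p = np.full(3, 1 / 3)
q = np.full(3, 1 / 3)
M = cost_matrix(support_g, support_h)

T_exact, exact_cost = exact_ot(p, q, M)
T_eps, eps_cost = sinkhorn(p, q, M, eps=1.0)
print("exact OT cost:", exact_cost)
print("Sinkhorn transport cost (eps=1.0):", eps_cost)
```

In this toy case every atom of G is at distance at least 0.5 from every atom of G′. The exact cost is therefore 0.25, attained by the monotone matching.

Once Sinkhorn has converged, its coupling is feasible, so its transport cost is at least the exact one. The gap is at most eps·log(k·k′). Smaller eps tightens the approximation, but it needs more iterations and risks underflow in `exp(-M / eps)`.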

Similar resources

Multilevel Clustering via Wasserstein Means

We propose a novel approach to the problem of multilevel clustering, which aims to simultaneously partition data in each group and discover grouping patterns among groups in a potentially large hierarchically structured corpus of data. Our method involves a joint optimization formulation over several spaces of discrete probability measures, which are endowed with Wasserstein distance metrics. W...

Wasserstein k-means++ for Cloud Regime Histogram Clustering

Much work has sought to discern the different types of cloud regimes, typically via Euclidean k-means clustering of histograms. However, these methods ignore the underlying similarity structure of cloud types. Wasserstein k-means clustering is a promising candidate for utilizing this structure during clustering, but existing algorithms do not scale well and lack the quality guarantees of the Eu...

Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances

This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describes individuals in terms of empirical distributions. These kinds of data can be considered as complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatial or temporal variant data, results of queries, environme...

Supplementary Material of Differentially Private Clustering in High-Dimensional Euclidean Spaces

Non-Private Clustering: There is a wide range of prior work on the problem of center-based clustering in the absence of privacy requirements. It is known that exact optimization of the objective function in $\mathbb{R}^d$ is computationally intractable (Dasgupta, 2008), even for 2-means clustering. To avoid this computational obstacle, several approximation algorithms have been developed, e.g., by ...

Supplementary Material for Bayesian Nonparametric Multilevel Clustering with Contexts

Vu Nguyen†, Dinh Phung†, XuanLong Nguyen‡, S. Venkatesh†, and Hung Bui∗. †Centre for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Australia. {tvnguye,dinh.phung,svetha.venkatesh}@deakin.edu.au ‡Department of Statistics, Dept of Electrical Engineering and Computer Science, University of Michigan. ∗Laboratory for Natural Language Understanding, Nuance Commun...

Publication date: 2017